144 ◾ Bioinformatics
remember that the consequence may also be beneficial in some cases. For instance, it has
been reported that truncating variants in the CARD9, IL23R, and RNF186 genes may protect
against Crohn’s disease and ulcerative colitis, and that truncating variants in the ANGPTL4,
APOC3, PCSK9, and LPA genes may protect against coronary heart disease [9, 10]. The
impact of a variant on a protein also depends on the number of isoforms produced by
the affected gene and the percentage of the protein affected. Moreover, we should take into
consideration that a frameshift may be bypassed by splicing, or its impact may be canceled
by a second frameshift that restores the reading frame. When annotating genetic variants,
the effect of SNVs can be predicted with the highest accuracy, followed by small InDels
(1–50 bp) and then medium InDels (50–100 bp). It is also easy to predict the effect of a
missense SNV. The variant annotation
tools use several approaches to predict the effect of a missense variant. For instance, they
can take into consideration the physicochemical properties of the amino acids involved,
whether the variant lies in a conserved region, and whether it affects the three-dimensional
structure of the protein. A variant in a region conserved across species or in a region that
forms secondary or tertiary structure is more likely to be deleterious. Some tools use
homology modeling to simulate the structure of the altered protein and predict the effect of
variants, while other tools use machine learning over multiple features to annotate variants
with the appropriate information and consequences. Figure 4.8 and Table 4.3 show a segment
of a eukaryotic gene and the possible variant annotations in each region.
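The rules described above (codon-level consequences of an SNV, frameshift versus in-frame InDels, and a second frameshift restoring the reading frame) can be illustrated with a minimal Python sketch. It is not how any particular annotation tool works. The function names and the four-class grouping of amino acids are assumptions made for illustration, and the codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Simplified physicochemical classes (an assumption; real tools use richer scores).
AA_CLASSES = {
    "nonpolar": "AVLIMFWPG",
    "polar": "STCYNQ",
    "acidic": "DE",
    "basic": "KRH",
}
AA_TO_CLASS = {aa: name for name, members in AA_CLASSES.items() for aa in members}


def snv_consequence(ref_codon: str, alt_codon: str) -> str:
    """Classify a single-codon change by comparing the encoded amino acids."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "stop_retained" if ref_aa == "*" else "synonymous"
    if alt_aa == "*":
        return "stop_gained"
    if ref_aa == "*":
        return "stop_lost"
    return "missense"


def is_conservative(ref_aa: str, alt_aa: str) -> bool:
    """True if both amino acids fall in the same simplified class."""
    return AA_TO_CLASS[ref_aa] == AA_TO_CLASS[alt_aa]


def indel_consequence(ref: str, alt: str) -> str:
    """Classify a coding InDel by whether its length change is a multiple of 3."""
    delta = len(alt) - len(ref)
    if delta == 0:
        raise ValueError("not an InDel: alleles have equal length")
    if delta % 3 != 0:
        return "frameshift"
    return "inframe_insertion" if delta > 0 else "inframe_deletion"


def frame_restored(length_changes: list[int]) -> bool:
    """True if the net length change of several InDels keeps the reading frame."""
    return sum(length_changes) % 3 == 0


if __name__ == "__main__":
    print(snv_consequence("GAA", "GTA"))   # Glu -> Val: missense
    print(is_conservative("E", "V"))       # acidic vs nonpolar: False
    print(indel_consequence("A", "AT"))    # +1 bp: frameshift
    print(frame_restored([+1, -1]))        # second InDel cancels the first: True
```

In this sketch a one-base insertion followed by a one-base deletion in the same transcript has a net length change of zero, so the downstream reading frame is preserved even though each InDel alone would be a frameshift.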
In a typical genome-wide variant study, thousands of variants may be discovered. The
significance of these variants varies based on the type, location, and possible consequence.
FIGURE 4.8 Variant effect on gene regions.
TABLE 4.3 Gene Regions and Variant Effect
| Region | Variant Effect |
|---|---|
| (1) Regulatory region including transcription factor (TF) binding site | Deleterious variants |
| (2) Upstream gene region | Intergenic variants/upstream gene variant |
| (3) 5′ UTR region | 5′ UTR variant |
| (4) Transcription start site (TSS) | Start retained or start lost variants |
| (5) Exon region | Exonic variants include missense, nonsense (stop gained), frameshift, inframe insertion or deletion |
| (6) Splice donor region | Splice-site variant (exon loss, intron inclusion, altered protein-coding sequence) |
| (7) Splice acceptor region | Splice-site variant (exon loss, intron inclusion, altered protein-coding sequence) |
| (8) Intron region | Intronic variant |
| (9) Transcription termination site (TTS) | Stop lost, stop retained, incomplete terminal codon |
| (10) 3′ UTR region | 3′ UTR variant |
| (11) Downstream gene region | Downstream gene variant |
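As a rough illustration of how a region-to-effect mapping like Table 4.3 could be applied, the following Python sketch looks up the region containing a variant position and returns the matching annotation. The gene model, its coordinates, and the names `Region`, `TOY_GENE`, and `annotate_position` are all hypothetical. Real annotation tools work from curated transcript models rather than a hand-written list.

```python
from typing import NamedTuple


class Region(NamedTuple):
    name: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    effect: str


# Hypothetical two-exon gene; the coordinates are invented for illustration.
TOY_GENE = [
    Region("upstream", 1, 100, "upstream gene variant"),
    Region("5' UTR", 101, 150, "5' UTR variant"),
    Region("exon 1", 151, 300, "exonic variant"),
    Region("splice donor", 301, 302, "splice-site variant"),
    Region("intron 1", 303, 498, "intronic variant"),
    Region("splice acceptor", 499, 500, "splice-site variant"),
    Region("exon 2", 501, 650, "exonic variant"),
    Region("3' UTR", 651, 700, "3' UTR variant"),
    Region("downstream", 701, 800, "downstream gene variant"),
]


def annotate_position(pos: int, regions: list[Region] = TOY_GENE) -> tuple[str, str]:
    """Return (region name, variant effect) for a 1-based position."""
    for region in regions:
        if region.start <= pos <= region.end:
            return region.name, region.effect
    # Positions outside every listed region are treated as intergenic.
    return "intergenic", "intergenic variant"


if __name__ == "__main__":
    for pos in (120, 301, 400, 5000):
        print(pos, annotate_position(pos))
```

A linear scan is enough for a single toy gene. For genome-wide data an interval index would be the more usual choice.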